Abstract

This study continued the research on analogy problem solving
on psychometric tests pursued by Bejar, Chaffin and Embretson
(1991). Specifically, characteristics of a semantic taxonomy and a
cognitively and empirically motivated intensional/pragmatic (I/P)
dichotomy were explored. There were two research questions: (1)
Could Bejar et al.'s results be replicated with SAT items? and (2)
Would factor analyses support the bidimensional processing
structure suggested by the I/P distinction? A specially
constructed test of disclosed SAT analogies was administered to a
group of 189 undergraduate students. Though factor analyses did
not support the expected bidimensionality, a better understanding
of both the semantic taxonomy and the I/P dichotomy was achieved.
Suggestions for future work were given.

The Dimensionality of Responses to SAT Analogy Items

Research on analogical reasoning has at times emphasized
processing (e.g., Sternberg, 1977) and at other times knowledge
(Whitely, 1976; Pellegrino & Glaser, 1979). Thus on one hand,
within the purview of the "normative processor model" proposed by
Bejar, Chaffin and Embretson (1991) for analogies of the form
A:B::?:?, the individual is assumed to encode A, encode B, and
postulate a relationship. Then, each alternative pair, Ci:Di, is
encoded, its relationship induced, and then judged in terms of
"most similar to" A:B's relationship. On the other hand, Bejar et
al. (1991) and Chaffin and Pierce (1987) proposed a semantic
taxonomy to describe types of knowledge or schema that may be
required to solve GRE items. They additionally presented a
pragmatic-intensional dichotomy to explain empirical aspects of
their data. Hence, the purpose of this study was twofold: (1) to
replicate Bejar et al.'s 1991 findings with SAT items and (2) to
check if the dimensionality of responses to SAT items was
consistent with the pragmatic-intensional dichotomy.

To provide some theoretical background, semantic taxonomies
have been developed to portray "a limited number of relations ...
which can function as explanatory primitives (the most 'primitive'
or basic set of relations, to which all others can be reduced;
Sowa, 1984, p. 13) in associationist and network theories of
mental function" (Chaffin & Herrmann, 1988). Table 1 shows a
taxonomy describing 179 GRE analogies (Bejar et al., 1991). At
the very least, these relation categories draw from an examinee's
semantic and syntactic knowledge bases.


Insert Table 1 about here 



However, when the difficulties of the semantic classes were 
examined (Bejar et al., 1991), it was found that the classes fell 
into two cluster types, called intensional and pragmatic. 
Analogies involving intensional relations (class inclusion,
similar, contrast, attribute, nonattribute) were more difficult,
while those involving pragmatic relations (cause-purpose,
space-time, part-whole, representation) were easier. Given this
hierarchy, illustrated in Table 2, relations were assigned to a 
semantic class and thereby simultaneously placed into one of the 
two type categories. 



Insert Table 2 about here 



After an overall examination of the two clusters of 
relations, certain characteristics were observed. Intensional 
relations were based on a comparison of the attributes or 
properties of two concepts while pragmatic relations were based on 
the co-occurrence of two things in the world. An example of this
distinction (Bejar et al., 1991, p. 68) can be found in Figure 1.



A farmer is by definition a person and a tractor is a vehicle.
The relation between these word pairs is intensional because it
rests purely on the comparison of two concepts. In contrast, the
relation between a farmer and a tractor is pragmatic. It rests on
the particular circumstances found in most technological
societies.

Since inducing relations between words is a creative,
productive ability/skill, solving intensional or pragmatic
analogies may impose different process requirements (Klix & van
der Meer, 1980; Klix, van der Meer & Preuß, 1985; Rumelhart,
1989). Therefore, the way intensional and pragmatic items covary
should confirm an intensional-pragmatic processing distinction.
Specifically, factor analyses of analogy item responses should
confirm a very specific dimensionality: bidimensionality.
Further, since different levels of analysis can highlight
different structural features, factor analysis models reflecting
this dimensionality constraint were run on a cluster level defined
by the semantic classes, as well as an item level (Dorans &
Lawrence, 1992; Wainer & Lewis, 1990).



Insert Figure 1 About Here 
To this end, the SAT instrument and sample are described and
outcomes are compared with previous GRE results. Since Bejar et
al.'s results were replicated, confirmatory and exploratory factor
analysis models were run to test the substantive hypothesis of
an intensional-pragmatic bidimensionality.

Method 

Instrument 

An analogy test of 40 items was created from a pool of 399 
disclosed SAT analogy items. Two undergraduates, chosen for their 
high verbal ability, classified all items into the mutually 
exclusive classes of the taxonomy. Disagreements were resolved 
through discussion. Then, four items from each semantic class 
were selected such that, of the four items, one was high on the 
difficulty continuum, one low, and two fell at the middle 
difficulty quartiles, where difficulty was measured by the ETS 
delta statistic. Care was also taken that no words would appear 
more than once. 

Subjects 

The subjects consisted of 189 Trenton State undergraduate 
volunteers. There were 118 females and 70 males and most students 
(94%) were 22 years old or younger. All students, except four, 
spoke English as a first language. 
The instrument was group administered by one of the authors
as part of a class exercise, and subjects were allowed as much time
as they needed for the task.

Results 

Descriptive Results 

In calculating scores, omitted items were scored as
incorrect, for the following reasons: (1) response patterns showed
items missing throughout the test, i.e., the patterns did not
fulfill the typical expectation of many omits occurring at the end
of the test; (2) students often got correct answers at the end of
the test, even with many omits earlier on (this was not unexpected,
as the items were ordered by accession number instead of, as is
usual, by difficulty); (3) the two parameter IRT model, with no
guessing parameter, fit best; and (4) there was a strong
relationship between item difficulty and number of omits (r=.71,
p<.01), demonstrating that more difficult items, as independently
measured by ETS's item delta statistic, tended to be those with
larger numbers of omits. It is unlikely that rights scoring would
have much impact on these results.
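
The scoring rule and the omit check described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the use of None to mark an omitted response, and the data layout (one list of responses per examinee) are hypothetical and are not taken from the study's actual procedure.

```python
from math import sqrt

def score_responses(responses, key):
    """Score one examinee: 1 if the response matches the key, else 0.
    An omitted response (assumed to be coded as None) is scored 0."""
    return [1 if r is not None and r == k else 0 for r, k in zip(responses, key)]

def omit_counts(all_responses, n_items):
    """Count, for each item, how many examinees omitted it."""
    return [sum(1 for resp in all_responses if resp[j] is None)
            for j in range(n_items)]

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

Correlating the per-item omit counts with the items' delta values using pearson_r would give the kind of difficulty-by-omits relationship reported in reason (4).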

A frequency distribution of total scores is shown in
Figure 2. The distribution had a negative skew (Sk=-.603) and a
mean of 25.7 out of a possible 40, indicating that the items may
have been on the easy side for this sample. Note that SAT items
are actually designed for a population of college-bound high
school students.



In addition, analyses were carried out to see if some
previous findings from Bejar et al.'s (1991) work were replicated
with SAT items. For example, Bejar et al. found, as others have
(Lord, 1975), a substantial and persistent negative correlation
(r=-.51) between item difficulty (delta) and item discrimination
(r-biserial) for analogy items. However, since the criterion
typically used to calculate the biserial correlation has been the
SAT verbal score, some alternative approaches were considered
here. First, using biserial correlations and delta statistics
from TESTFACT, where the criterion was the analogy instrument
administered, a negative correlation was still evident (r=-.55,
p<.01). Figure 3 shows the biserial by delta scatterplot.
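
For readers unfamiliar with the two classical item statistics, a minimal sketch follows. It assumes the conventional definition of the ETS delta scale (mean 13, standard deviation 4, larger values for harder items) and the textbook biserial formula; neither formula is spelled out in the source, the function names are hypothetical, and this is not the TESTFACT computation itself.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1

def ets_delta(p_correct):
    """Delta difficulty index, assuming delta = 13 - 4 * z(p),
    where z(p) is the normal deviate of the proportion correct."""
    return 13.0 - 4.0 * _STD_NORMAL.inv_cdf(p_correct)

def biserial(item_scores, criterion):
    """Biserial correlation between a 0/1 item and a continuous criterion.
    Requires at least one correct and one incorrect response."""
    n = len(item_scores)
    n_right = sum(item_scores)
    p = n_right / n
    q = 1.0 - p
    mean_right = sum(c for u, c in zip(item_scores, criterion) if u == 1) / n_right
    mean_wrong = sum(c for u, c in zip(item_scores, criterion) if u == 0) / (n - n_right)
    mean_all = sum(criterion) / n
    sd = sqrt(sum((c - mean_all) ** 2 for c in criterion) / n)
    # Ordinate of the normal curve at the point dividing p from q.
    y = _STD_NORMAL.pdf(_STD_NORMAL.inv_cdf(p))
    return (mean_right - mean_wrong) / sd * (p * q / y)
```

Under this definition an item answered correctly by half the sample has a delta of 13, and easier items receive smaller deltas.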



Insert Figure 2 about here



Insert Figure 3 about here



Second, the relationship of other estimates of discrimination and
difficulty was examined: a, a measure proportional to the slope of
the item characteristic curve (ICC) at its point of inflection, and
b, the location of the ICC's point of inflection. The correlation
between the a and b parameters from the best fitting IRT model, a
two parameter model, was still negative but not significantly
different from zero (r=-.35), even though b and delta correlated
.96 and a and r-biserial correlated .95.
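
As a reminder of what the a and b parameters represent, the two parameter model can be written as below. The logistic form is an assumption made here for illustration; the source does not state whether a logistic or a normal ogive form was fitted, and some parameterizations also multiply a_i by a scaling constant.

```latex
P_i(\theta) = \frac{1}{1 + \exp\{-a_i(\theta - b_i)\}},
\qquad
P_i(b_i) = \tfrac{1}{2},
\qquad
\left.\frac{dP_i}{d\theta}\right|_{\theta = b_i} = \frac{a_i}{4}
```

Here b_i is the ability level at which the curve inflects (the item's difficulty) and the slope at that point is proportional to a_i (the item's discrimination).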

Bejar et al. (1991) also found that items classified as
intensional were more difficult and less discriminating than items
classified as pragmatic, as indicated by mean ETS delta and
r-biserial statistics. The data for this sample replicated this
finding: (1) the mean number correct for pragmatic items was 13.87
(sd=2.97, out of 20 items), as compared to the intensional items'
mean of 11.82 (sd=3.67); (2) the distribution of pragmatic item
scores was more negatively skewed (sk=-.78) than that of
intensional item scores (sk=-.36); but (3) while intensional items
remained more difficult than pragmatic items, the classes' rank
ordering on difficulty and discrimination changed somewhat,
perhaps because when this instrument was constructed, an effort
was made to include an entire spectrum of item difficulties.¹

As analogy items are intended to measure analogical 
reasoning ability, another important concern is how much variance 
in item difficulty is due to vocabulary knowledge. Certainly, an 
examinee cannot hope to reason analogically if she does not know 
item word meanings; once the hurdle of vocabulary knowledge is 
surpassed, then analogical reasoning can begin. Therefore, it is 
expected that some part of the variance in item difficulty is due 
to vocabulary. This was the case for Bejar et al.'s 1991 study 
where a significant 10% of the variance in delta was explained by
stem and key word frequency. (Four word frequencies, two for the
stem and two for the key pairs, were collated from Kucera and
Francis' (1967) text; the natural log of the minimum frequency of
a pair's two words was then chosen to represent each pair.) In
the present sample 12% of the variance was explained by stem and
key word frequency (Kucera & Francis, 1967), but the relation was
not significant, probably due to less power (N=40 items here
versus N=179 in Bejar et al.'s study). So SAT items do not seem
to draw on vocabulary knowledge more or less than GRE items.
Interestingly, when a variable taking into account the average word
frequency of the alternatives was added to the regression equation,
24% of the variance in delta was explained and the relation was
significant (p<.05).

These same regressions were also run for intensional (N=20
items) and pragmatic items (N=20) separately, as Bejar et al.
(1991) did. Here, only the average minimum frequency of the
alternatives for intensional items explained difficulty; word
frequency was not related to difficulty when looking at pragmatic
items. However, these regression analyses had even less power.

A priori Confirmatory Factor Analysis

There were several structural models suggested by the above
mentioned intensional/pragmatic distinction.² Model I was a
direct and uncomplicated picture of two correlated factors:
intensional and pragmatic clusters/items loaded on their
corresponding factors. Model II had a hierarchical form with an
overall exogenous Proficiency factor explaining the correlation
between two endogenous factors (intensional and pragmatic). While
theoretically relevant, this model could not be identified with
only two endogenous factors (Rindskopf & Rose, 1988). Model III
sought to portray the idea that every concept is at its center
intensional, or definitional, but some also have pragmatic
characteristics. Note that Model III was a bi-factor analysis
(Gibbons & Hedeker, in press). Please see Figures 4, 5 and 6.



Insert Figures 4, 5 and 6 about here 



These models were tested at both the cluster and item level.
Unfortunately, the limitations of the available software and the
assumptions of factor analysis dictated the possible approaches. A
cluster level analysis did not violate factor analysis
assumptions; four items per cluster formed continuous variables
and traditional LISREL analyses could be run. However, factor
analyses on item level data are notoriously problematic (for a
review, see Dorans & Lawrence, 1992). Hence, TESTFACT was used to
calculate smoothed tetrachoric matrices, with a statistical
guarantee of being positive definite (Bock, Gibbons & Muraki,
1988). Unfortunately, TESTFACT had limited capabilities for
confirmatory models. For example, to identify the models,
TESTFACT assumed that all factors were uncorrelated.
Additionally, it was only possible to allow loadings to be free or
fix them equal to zero. Therefore, LISREL also was used to
examine item level confirmatory analyses by importing the smoothed
tetrachoric matrices.

Moreover, problems due to a large ratio of number of items to
total number of students (40 items to 179 students) necessitated
subset analyses for item level confirmatory models in LISREL.
Consequently, for Models I and III, specific subsets of the data
were further probed. Since the class inclusion and case relation
classes were the theoretically most pristine class exemplars of
intensional and pragmatic properties (Chaffin, 1992, personal
communication), the models were rerun with just these eight items.
Also, since item difficulty has historically provided a recurrent
methodological theme (Dorans & Lawrence, 1992), additional subsets
of items were surveyed. A subset of extreme items, hard and easy,
called hilo, was scrutinized. Easy and hard items may draw on
different skills for people with varying ability levels, or
methodological factors may be revealed. In addition, the items
lying at the inner quartiles of difficulty, called middle, were
considered more stable class representatives.
Cluster Level Results

As a first step, a one factor model was tested: it fit.
Since the hypothesized models were postulated a priori, their
outcomes were examined, even though it would be hard to justify
accepting a less parsimonious solution. Model I fit, but the
factors correlated r=.97. Models II and III were rejected due to
estimation and identification problems (e.g., negative variances).
Some of these difficulties were undoubtedly caused by the high
intercorrelation between factors. Unidimensionality was again
verified using Bejar's (1980) approach. That is, b estimates for
the whole test were plotted against b estimates for the intensional
and pragmatic subtests; this scatterplot is shown in Figure 7. If
the test data were unidimensional, points should lie primarily on
a straight line going through 0 with a slope of 1, which was
the case.
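
The comparison of subtest and whole-test b estimates is described above as a visual inspection of a scatterplot. One way to summarize such a plot numerically, offered here only as an assumption and not as the procedure used in the study, is to fit an ordinary least-squares line and check that its slope is near 1 and its intercept near 0. The function name is hypothetical.

```python
def fit_line(whole_test_b, subtest_b):
    """Ordinary least-squares line predicting subtest b estimates
    from whole-test b estimates; returns (slope, intercept)."""
    n = len(whole_test_b)
    mean_x = sum(whole_test_b) / n
    mean_y = sum(subtest_b) / n
    sxx = sum((x - mean_x) ** 2 for x in whole_test_b)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(whole_test_b, subtest_b))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

A slope well away from 1 or an intercept well away from 0 would suggest that the subtest calibrates items differently from the whole test.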



In addition, the two parameter model fit best for both subtests as
well as the entire test.

For this sample and set of analogy items, a one factor
solution was deemed most descriptive, presenting a rare and
informative case of unidimensionality at the cluster level.
Perhaps different kinds of relations reflect a coherence in mental
structure leading to a unidimensional test.



Insert Figure 7 about here

Item Level Results 

Model I was run on the case relation/class-inclusion and
middle subsets of data at the item level. Following the same
reasoning described above, a one factor solution fit best for
middle items. Further, despite high expectations, Model I was
rejected for case relation/class-inclusion and middle items. For
the hilo data, the hypothesis of methodological or difficulty
factors was rejected.

Exploratory Factor Analyses 
Cluster Level Results 

To check whether a two dimensional model could possibly fit,
clusters of four items were randomly grouped together. Indeed,
though theoretically meaningless, a bidimensional model could be
supported. This finding strengthens the significance of the
unidimensionality observed when the items were clustered by
semantic class.
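
The random grouping can be sketched as below. The function names, the fixed seed, and the use of summed item scores as the cluster variable are assumptions made for illustration; the source states only that clusters of four items were formed at random.

```python
import random

def random_clusters(n_items=40, cluster_size=4, seed=0):
    """Shuffle item indices and cut them into clusters of equal size."""
    rng = random.Random(seed)  # seeded so the grouping is reproducible
    items = list(range(n_items))
    rng.shuffle(items)
    return [items[i:i + cluster_size] for i in range(0, n_items, cluster_size)]

def cluster_scores(item_scores, clusters):
    """Sum one examinee's 0/1 item scores within each cluster."""
    return [sum(item_scores[j] for j in cluster) for cluster in clusters]
```

Each examinee's ten cluster scores, each ranging from 0 to 4, would then serve as the observed variables in the factor analysis, in place of clusters defined by semantic class.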

Item Level Results 

In contrast, exploratory analyses of all 40 items with 
TESTFACT indicated that an item level, one factor model did not 
fit; the one factor solution was rejected for all sets of data. 

In fact, except for the case relation/class inclusion data
subset, the two factor solution significantly improved fit over
the one factor model, as evaluated by the change in χ² statistics.
Clearly, at the item level, something more was going on. Only one
interpretation seemed sensible. In the unrotated two factor
solution for the entire test, a main overall factor dominated,
perhaps like g, while the loadings for the second factor, with
negative and positive values, correlated significantly with delta
(r=.79, p<.01). This provided weak support for a method factor
(Dorans & Lawrence, 1992). This was the case despite rejection of
the confirmatory models using the hilo subset of data.
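
For reference, the usual form of a change-in-χ² comparison between nested factor models is shown below. The notation is an assumption added for illustration; the source reports only that the change in χ² was used.

```latex
\Delta\chi^2 = \chi^2_{\text{one factor}} - \chi^2_{\text{two factor}},
\qquad
\Delta df = df_{\text{one factor}} - df_{\text{two factor}}
```

The difference is referred to a χ² distribution with Δdf degrees of freedom; a significant value indicates that adding the second factor improves fit.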

Discussion 

A review of cognitive research and recent empirical results
using GRE items led to the hypothesis that analogy items falling
into an intensional and pragmatic dichotomy should reflect,
through a bidimensional factor structure, their respective
cognitive processing requirements. This study examined this
issue. Further, some of Bejar et al.'s 1991 results were checked
for replication on SAT items.

Initial descriptive results showed that SAT items functioned
similarly to GRE items. A negative correlation between delta and
r-biserial, appearing consistently over many analogy item sets,
persisted here as well. Further, intensional items were more
difficult than pragmatic items. This occurred despite an effort in
instrument development to keep a consistent spectrum of difficulty
over classes. These results suggested that certain
characteristics of intensional items created more difficult items.
In addition, vocabulary levels for stem and key pairs did not seem
to contribute more or less to item difficulty for these SAT items
as compared to the GRE's.

However, confirmatory and exploratory analyses led to
unexpected outcomes. When items were clustered according to
class, the test remained unremittingly unidimensional. Yet using
another tactic to cluster items, though the clusters were not
theoretically relevant, did allow for two factors. Further, two
factors fit better than one in item level exploratory analyses.
So the test was not in all forms unidimensional. The cohesiveness
of the clusters may be a way for test developers to design
theoretically, rather than statistically, unidimensional analogy
tests. These results also lend credence to this taxonomy being
appropriate for SAT items too.

While these results were not supportive of the expected
bidimensionality, there were several differences between the two
studies that might account for this outcome. The sample tested
was not drawn from a traditional SAT population. These students
had emerged as undergraduates from a much larger group of
'possibly college bound students'. They had presumably further
matured and learned. Hence, the items seemed to be easier for
this sample. Also, no time constraints were imposed. In
addition, these particular analogies were drawn from an older set
of items with some historic problems. Since that time,
distractors have been written differently. Perhaps the noted
importance of the alternatives in predicting item difficulty was
related to past problems.

It may also be that factor analytic tools are not sensitive
enough to pick up fine processing differences on analogy item
types of the form A:B::?:?. Whitely and Schneider (1981)
mentioned three possible reasons for results of this nature: (1)
there may not be individual differences in processing abilities on
these two dimensions, (2) these abilities may not have been
reliably measured, and (3) the two distinct processing abilities
may be highly correlated. So theory need not be rejected on these
grounds alone.

Nonetheless, theoretical issues should still be questioned.
It may be that the procedure of categorizing items into classes,
which automatically placed them into the type dichotomy, was not
appropriate. Just because an item's relation is part-whole, for
example, does not mean it was treated by the processor as
pragmatic. In fact, when the items were post hoc reclassified by
one of the authors as intensional or pragmatic, independently of
the taxonomy, item difficulty was significantly explained (R²=.25,
p<.001). Further, certain stem relations, when considered within
the context of the alternatives, may be treated by the processor
as primarily intensional or pragmatic. Researchers (Bejar et al.,
1991; Barnes, 1980) have long noted that multiple choice
alternatives affect the way an item rationale is formulated.
Future work with this dichotomy must involve a separate
categorization of the taxonomy and dichotomy and a classification
methodology for the dichotomy that takes into account alternative
choice context effects.
Footnotes

1 The effort to control for difficulty met with moderate
success. The midpoints of delta at the 25th and 75th quartiles
ranged from 8.14 to 13.48; if difficulty had been tightly
controlled, this range would have been smaller.

2 Bejar et al. (1991) could not check dimensionality as only
item statistics were available.
Table 1

Semantic Taxonomy of Relations

Relation          Example           Rationale

Class Inclusion   robin:bird        A is a member of class B
Part-Whole        engine:car        A is a part of B
Similar           breeze:gale       B is a more intense A
Contrast          default:pay       A is the opposite of B
Attribute         beggar:poor       B is an attribute of A
Non-Attribute     harmony:discord   B is not an attribute of A
Case Relation     tailor:suit       A works on B
Cause/Purpose     hunger:eat        A is the cause of B
Space/Time        judge:court       A can be found in B
                  summer:harvest    B occurs during A
Representation    building:print    B is a representation of A


Table 2

The Intensional/Pragmatic and Semantic Class Hierarchy

Relation Dichotomy

Intensional       Pragmatic

Class Inclusion   Case Relation
Similar           Cause-Purpose
Contrast          Space-Time
Attribute         Part-Whole
Non-Attribute     Representation

Figure 1

Intensional/Pragmatic Example

Note. From Cognitive and Psychometric Analysis of Analogical Problem
Solving (p. 68) by I. I. Bejar, R. Chaffin, and S. E. Embretson,
1991, New York: Springer-Verlag. Copyright 1991 by Springer-Verlag.